Efficiency-Quality Tradeoffs for Vector Score Aggregation
Abstract
Finding the ℓ nearest neighbors to a query in a vector space is an important primitive in text and image retrieval. Here we study an extension of this problem with applications to XML and image retrieval: we have multiple vector spaces, and the query places a weight on each space. Match scores from the spaces are weighted by these weights to determine the overall match between each record and the query; this is a case of score aggregation. We study approximation algorithms that use a small fraction of the computation of exhaustive search through all records, while returning nearly the best matches. We focus on the tradeoff between the computation and the quality of the results. We develop two approaches to retrieval from such multiple vector spaces. The first is inspired by resource allocation. The second, inspired by computational geometry, combines the multiple vector spaces together with all possible query weights into a single larger space. While mathematically elegant, this abstraction is intractable for implementation. We therefore devise an approximation of this combined space. Experiments show that all our approaches (to varying extents) enable retrieval quality comparable to exhaustive search, while avoiding its heavy computational cost.

Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the VLDB copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Very Large Data Base Endowment. To copy otherwise, or to republish, requires a fee and/or special permission from the Endowment. Proceedings of the 30th VLDB Conference, Toronto, Canada, 2004

1 Overview: score aggregation

We have n records E = {e_1, e_2, ..., e_n} and s sources of evidence. For 1 ≤ i ≤ s, we have a source score σ_i(e_j) from source i for record e_j. Additionally, we have a positive real weight w_i for each of the s sources. For a specified positive integer ℓ, we seek the ℓ records of highest aggregate score defined as
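Reading the aggregate score as the weighted sum that the abstract describes, that is, the sum over i = 1, ..., s of w_i · σ_i(e_j) (an assumption, since the formula itself does not appear above), the exhaustive-search baseline that the approximation algorithms are measured against can be sketched as follows. The function names `aggregate_score` and `top_l_exhaustive` and the sample data are hypothetical, chosen only for illustration; this is not the authors' implementation.

```python
import heapq


def aggregate_score(weights, source_scores):
    """Weighted sum of one record's source scores.

    Assumption: the aggregate is sum_i w_i * sigma_i(e_j), following the
    abstract's description of weighted match scores.
    """
    return sum(w * s for w, s in zip(weights, source_scores))


def top_l_exhaustive(scores, weights, l):
    """Return the l records of highest aggregate score by scanning all records.

    scores[j][i] holds sigma_i(e_j), the score of record j from source i.
    The result is a list of (record_index, aggregate_score) pairs,
    sorted from highest to lowest aggregate score.
    """
    totals = [aggregate_score(weights, row) for row in scores]
    best = heapq.nlargest(l, range(len(scores)), key=lambda j: totals[j])
    return [(j, totals[j]) for j in best]


# Hypothetical data: n = 4 records, s = 2 sources.
scores = [
    [0.50, 0.25],
    [0.25, 1.00],
    [0.00, 0.50],
    [0.75, 0.50],
]
weights = [2.0, 1.0]  # the query weights source 0 twice as heavily as source 1

print(top_l_exhaustive(scores, weights, 2))  # [(3, 2.0), (1, 1.5)]
```

This scan computes all n aggregate scores for every query, which is the computational cost that the approaches in the abstract aim to avoid while still returning nearly the same ℓ records.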
Similar resources
Axiomatic and Computational Aspects of Scoring Allocation Rules for Indivisible Goods
We define a family of rules for dividing m indivisible goods among agents, parameterized by a scoring vector and a social welfare aggregation function. We assume that agents’ preferences over sets of goods are additive, but that the input is ordinal: each agent simply ranks single goods. Similarly to (positional) scoring rules in voting, a scoring vector s= (s1, . . . ,sm) consists of m nonincr...
Scoring Rules for the Allocation of Indivisible Goods
We define a family of rules for dividing m indivisible goods among agents, parameterized by a scoring vector and a social welfare aggregation function. We assume that agents’ preferences over sets of goods are additive, but that the input is ordinal: each agent simply ranks single goods. Similarly to (positional) scoring rules in voting, a scoring vector s = (s1, . . . ,sm) consists of m noninc...
Scoring Rules for the Allocation of Indivisible Goods
We define a family of rules for dividing m indivisible goods among agents, parameterized by a scoring vector and a social welfare aggregation function. We assume that agents’ preferences over sets of goods are additive, but that the input is ordinal: each agent simply ranks single goods. Similarly to (positional) scoring rules in voting, a scoring vector s= (s1, . . . ,sm) consists of m nonincr...
Privacy and Efficiency Tradeoffs for Multiword Top K Search with Linear Additive Rank Scoring
This paper proposes a private ranking scheme with linear additive scoring for efficient top K keyword search on modest-sized cloud datasets. This scheme strikes for tradeoffs between privacy and efficiency by proposing single-round client-server collaboration with server-side partial ranking based on blinded feature weights with random masks. Client-side preprocessing includes query decompositi...
Using Imperialist competitive algorithm optimization in multi-response nonlinear programming
The quality of manufactured products is characterized by many controllable quality factors. These factors should be optimized to reach high quality products. In this paper we try to find the controllable factors levels with minimum deviation from the target and with a least variation. To solve the problem a simple aggregation function is used to aggregate the multiple responses functions then a...